AAAI.2023 - Humans and AI

Total: 26

#1 The Perils of Trial-and-Error Reward Design: Misdesign through Overfitting and Invalid Task Specifications [PDF] [Copy] [Kimi]

Authors: Serena Booth ; W. Bradley Knox ; Julie Shah ; Scott Niekum ; Peter Stone ; Alessandro Allievi

In reinforcement learning (RL), a reward function that aligns exactly with a task's true performance metric is often necessarily sparse. For example, a true task metric might encode a reward of 1 upon success and 0 otherwise. The sparsity of these true task metrics can make them hard to learn from, so in practice they are often replaced with alternative dense reward functions. These dense reward functions are typically designed by experts through an ad hoc process of trial and error. In this process, experts manually search for a reward function that improves performance with respect to the task metric while also enabling an RL algorithm to learn faster. This process raises the question of whether the same reward function is optimal for all algorithms, i.e., whether the reward function can be overfit to a particular algorithm. In this paper, we study the consequences of this wide yet unexamined practice of trial-and-error reward design. We first conduct computational experiments that confirm that reward functions can be overfit to learning algorithms and their hyperparameters. We then conduct a controlled observation study which emulates expert practitioners' typical experiences of reward design, in which we similarly find evidence of reward function overfitting. We also find that experts' typical approach to reward design---of adopting a myopic strategy and weighing the relative goodness of each state-action pair---leads to misdesign through invalid task specifications, since RL algorithms use cumulative reward rather than rewards for individual state-action pairs as an optimization target. Code, data: github.com/serenabooth/reward-design-perils

#2 The Value of AI Guidance in Human Examination of Synthetically-Generated Faces [PDF] [Copy] [Kimi]

Authors: Aidan Boyd ; Patrick Tinsley ; Kevin Bowyer ; Adam Czajka

Face image synthesis has progressed beyond the point at which humans can effectively distinguish authentic faces from synthetically-generated ones. Recently developed synthetic face image detectors boast ``better-than-human'' discriminative ability, especially those guided by human perceptual intelligence during the model's training process. In this paper, we investigate whether these human-guided synthetic face detectors can assist non-expert human operators in the task of synthetic image detection when compared to models trained without human-guidance. We conducted a large-scale experiment with more than 1,560 subjects classifying whether an image shows an authentic or synthetically-generated face, and annotating regions supporting their decisions. In total, 56,015 annotations across 3,780 unique face images were collected. All subjects first examined samples without any AI support, followed by samples given (a) the AI's decision (``synthetic'' or ``authentic''), (b) class activation maps illustrating where the model deems salient for its decision, or (c) both the AI's decision and AI's saliency map. Synthetic faces were generated with six modern Generative Adversarial Networks. Interesting observations from this experiment include: (1) models trained with human-guidance, which are also more accurate in our experiments, offer better support to human examination of face images when compared to models trained traditionally using cross-entropy loss, (2) binary decisions presented to humans results in their better performance than when saliency maps are presented, (3) understanding the AI's accuracy helps humans to increase trust in a given model and thus increase their overall accuracy. This work demonstrates that although humans supported by machines achieve better-than-random accuracy of synthetic face detection, the approaches of supplying humans with AI support and of building trust are key factors determining high effectiveness of the human-AI tandem.

#3 Teaching to Learn: Sequential Teaching of Learners with Internal States [PDF] [Copy] [Kimi]

Authors: Mustafa Mert Çelikok ; Pierre-Alexandre Murena ; Samuel Kaski

In sequential machine teaching, a teacher’s objective is to provide the optimal sequence of inputs to sequential learners in order to guide them towards the best model. However, this teaching objective considers a restricted class of learners with fixed inductive biases. In this paper, we extend the machine teaching framework to learners that can improve their inductive biases, represented as latent internal states, in order to generalize to new datasets. We introduce a novel framework in which learners’ inductive biases may change with the teaching interaction, which affects the learning performance in future tasks. In order to teach such learners, we propose a multi-objective control approach that takes the future performance of the learner after teaching into account. This framework provides tools for modelling learners with internal states, humans and meta-learning algorithms alike. Furthermore, we distinguish manipulative teaching, which can be done by effectively hiding data and also used for indoctrination, from teaching to learn which aims to help the learner become better at learning from new datasets in the absence of a teacher. Our empirical results demonstrate that our framework is able to reduce the number of required tasks for online meta-learning, and increases independent learning performance of simulated human users in future tasks.

#4 Interactive Concept Bottleneck Models [PDF] [Copy] [Kimi]

Authors: Kushal Chauhan ; Rishabh Tiwari ; Jan Freyberg ; Pradeep Shenoy ; Krishnamurthy Dvijotham

Concept bottleneck models (CBMs) are interpretable neural networks that first predict labels for human-interpretable concepts relevant to the prediction task, and then predict the final label based on the concept label predictions. We extend CBMs to interactive prediction settings where the model can query a human collaborator for the label to some concepts. We develop an interaction policy that, at prediction time, chooses which concepts to request a label for so as to maximally improve the final prediction. We demonstrate that a simple policy combining concept prediction uncertainty and influence of the concept on the final prediction achieves strong performance and outperforms static approaches as well as active feature acquisition methods proposed in the literature. We show that the interactive CBM can achieve accuracy gains of 5-10% with only 5 interactions over competitive baselines on the Caltech-UCSD Birds, CheXpert and OAI datasets.

#5 Local Justice and Machine Learning: Modeling and Inferring Dynamic Ethical Preferences toward Allocations [PDF] [Copy] [Kimi]

Authors: Violet (Xinying) Chen ; Joshua Williams ; Derek Leben ; Hoda Heidari

We consider a setting in which a social planner has to make a sequence of decisions to allocate scarce resources in a high-stakes domain. Our goal is to understand stakeholders' dynamic moral preferences toward such allocational policies. In particular, we evaluate the sensitivity of moral preferences to the history of allocations and their perceived future impact on various socially salient groups. We propose a mathematical model to capture and infer such dynamic moral preferences. We illustrate our model through small-scale human-subject experiments focused on the allocation of scarce medical resource distributions during a hypothetical viral epidemic. We observe that participants' preferences are indeed history- and impact-dependent. Additionally, our preliminary experimental results reveal intriguing patterns specific to medical resources---a topic that is particularly salient against the backdrop of the global covid-19 pandemic.

#6 Extracting Semantic-Dynamic Features for Long-Term Stable Brain Computer Interface [PDF] [Copy] [Kimi]

Authors: Tao Fang ; Qian Zheng ; Yu Qi ; Gang Pan

Brain-computer Interface (BCI) builds a neural signal to the motor command pathway, which is a prerequisite for the realization of neural prosthetics. However, a long-term stable BCI suffers from the neural data drift across days while retraining the BCI decoder is expensive and restricts its application scenarios. Recent solutions of neural signal recalibration treat the continuous neural signals as discrete, which is less effective in temporal feature extraction. Inspired by the observation from biologists that low-dimensional dynamics could describe high-dimensional neural signals, we model the underlying neural dynamics and propose a semantic-dynamic feature that represents the semantics and dynamics in a shared feature space facilitating the BCI recalibration. Besides, we present the joint distribution alignment instead of the common used marginal alignment strategy, dealing with the various complex changes in neural data distribution. Our recalibration approach achieves state-of-the-art performance on the real neural data of two monkeys in both classification and regression tasks. Our approach is also evaluated on a simulated dataset, which indicates its robustness in dealing with various common causes of neural signal instability.

#7 Moral Machine or Tyranny of the Majority? [PDF] [Copy] [Kimi]

Authors: Michael Feffer ; Hoda Heidari ; Zachary C. Lipton

With artificial intelligence systems increasingly applied in consequential domains, researchers have begun to ask how AI systems ought to act in ethically charged situations where even humans lack consensus. In the Moral Machine project, researchers crowdsourced answers to "Trolley Problems" concerning autonomous vehicles. Subsequently, Noothigattu et al. (2018) proposed inferring linear functions that approximate each individual's preferences and aggregating these linear models by averaging parameters across the population. In this paper, we examine this averaging mechanism, focusing on fairness concerns and strategic effects. We investigate a simple setting where the population consists of two groups, the minority constitutes an α < 0.5 share of the population, and within-group preferences are homogeneous. Focusing on the fraction of contested cases where the minority group prevails, we make the following observations: (a) even when all parties report their preferences truthfully, the fraction of disputes where the minority prevails is less than proportionate in α; (b) the degree of sub-proportionality grows more severe as the level of disagreement between the groups increases; (c) when parties report preferences strategically, pure strategy equilibria do not always exist; and (d) whenever a pure strategy equilibrium exists, the majority group prevails 100% of the time. These findings raise concerns about stability and fairness of averaging as a mechanism for aggregating diverging voices. Finally, we discuss alternatives, including randomized dictatorship and median-based mechanisms.

#8 The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types [PDF] [Copy] [Kimi]

Authors: Gaurav R. Ghosal ; Matthew Zurek ; Daniel S. Brown ; Anca D. Dragan

When inferring reward functions from human behavior (be it demonstrations, comparisons, physical corrections, or e-stops), it has proven useful to model the human as making noisy-rational choices, with a "rationality coefficient" capturing how much noise or entropy we expect to see in the human behavior. Prior work typically sets the rationality level to a constant value, regardless of the type, or quality, of human feedback. However, in many settings, giving one type of feedback (e.g. a demonstration) may be much more difficult than a different type of feedback (e.g. answering a comparison query). Thus, we expect to see more or less noise depending on the type of human feedback. In this work, we advocate that grounding the rationality coefficient in real data for each feedback type, rather than assuming a default value, has a significant positive effect on reward learning. We test this in both simulated experiments and in a user study with real human feedback. We find that overestimating human rationality can have dire effects on reward learning accuracy and regret. We also find that fitting the rationality coefficient to human data enables better reward learning, even when the human deviates significantly from the noisy-rational choice model due to systematic biases. Further, we find that the rationality level affects the informativeness of each feedback type: surprisingly, demonstrations are not always the most informative---when the human acts very suboptimally, comparisons actually become more informative, even when the rationality level is the same for both. Ultimately, our results emphasize the importance and advantage of paying attention to the assumed human-rationality-level, especially when agents actively learn from multiple types of human feedback.

#9 The Role of Heuristics and Biases during Complex Choices with an AI Teammate [PDF] [Copy] [Kimi]

Authors: Nikolos Gurney ; John H. Miller ; David V. Pynadath

Behavioral scientists have classically documented aversion to algorithmic decision aids, from simple linear models to AI. Sentiment, however, is changing and possibly accelerating AI helper usage. AI assistance is, arguably, most valuable when humans must make complex choices. We argue that classic experimental methods used to study heuristics and biases are insufficient for studying complex choices made with AI helpers. We adapted an experimental paradigm designed for studying complex choices in such contexts. We show that framing and anchoring effects impact how people work with an AI helper and are predictive of choice outcomes. The evidence suggests that some participants, particularly those in a loss frame, put too much faith in the AI helper and experienced worse choice outcomes by doing so. The paradigm also generates computational modeling-friendly data allowing future studies of human-AI decision making.

#10 Learning to Defer with Limited Expert Predictions [PDF] [Copy] [Kimi]

Authors: Patrick Hemmer ; Lukas Thede ; Michael Vössing ; Johannes Jakubik ; Niklas Kühl

Recent research suggests that combining AI models with a human expert can exceed the performance of either alone. The combination of their capabilities is often realized by learning to defer algorithms that enable the AI to learn to decide whether to make a prediction for a particular instance or defer it to the human expert. However, to accurately learn which instances should be deferred to the human expert, a large number of expert predictions that accurately reflect the expert's capabilities are required—in addition to the ground truth labels needed to train the AI. This requirement shared by many learning to defer algorithms hinders their adoption in scenarios where the responsible expert regularly changes or where acquiring a sufficient number of expert predictions is costly. In this paper, we propose a three-step approach to reduce the number of expert predictions required to train learning to defer algorithms. It encompasses (1) the training of an embedding model with ground truth labels to generate feature representations that serve as a basis for (2) the training of an expertise predictor model to approximate the expert's capabilities. (3) The expertise predictor generates artificial expert predictions for instances not yet labeled by the expert, which are required by the learning to defer algorithms. We evaluate our approach on two public datasets. One with "synthetically" generated human experts and another from the medical domain containing real-world radiologists' predictions. Our experiments show that the approach allows the training of various learning to defer algorithms with a minimal number of human expert predictions. Furthermore, we demonstrate that even a small number of expert predictions per class is sufficient for these algorithms to exceed the performance the AI and the human expert can achieve individually.

#11 SWL-Adapt: An Unsupervised Domain Adaptation Model with Sample Weight Learning for Cross-User Wearable Human Activity Recognition [PDF] [Copy] [Kimi]

Authors: Rong Hu ; Ling Chen ; Shenghuan Miao ; Xing Tang

In practice, Wearable Human Activity Recognition (WHAR) models usually face performance degradation on the new user due to user variance. Unsupervised domain adaptation (UDA) becomes the natural solution to cross-user WHAR under annotation scarcity. Existing UDA models usually align samples across domains without differentiation, which ignores the difference among samples. In this paper, we propose an unsupervised domain adaptation model with sample weight learning (SWL-Adapt) for cross-user WHAR. SWL-Adapt calculates sample weights according to the classification loss and domain discrimination loss of each sample with a parameterized network. We introduce the meta-optimization based update rule to learn this network end-to-end, which is guided by meta-classification loss on the selected pseudo-labeled target samples. Therefore, this network can fit a weighting function according to the cross-user WHAR task at hand, which is superior to existing sample differentiation rules fixed for special scenarios. Extensive experiments on three public WHAR datasets demonstrate that SWL-Adapt achieves the state-of-the-art performance on the cross-user WHAR task, outperforming the best baseline by an average of 3.1% and 5.3% in accuracy and macro F1 score, respectively.

#12 Incentive-Boosted Federated Crowdsourcing [PDF] [Copy] [Kimi]

Authors: Xiangping Kang ; Guoxian Yu ; Jun Wang ; Wei Guo ; Carlotta Domeniconi ; Jinglin Zhang

Crowdsourcing is a favorable computing paradigm for processing computer-hard tasks by harnessing human intelligence. However, generic crowdsourcing systems may lead to privacy-leakage through the sharing of worker data. To tackle this problem, we propose a novel approach, called iFedCrowd (incentive-boosted Federated Crowdsourcing), to manage the privacy and quality of crowdsourcing projects. iFedCrowd allows participants to locally process sensitive data and only upload encrypted training models, and then aggregates the model parameters to build a shared server model to protect data privacy. To motivate workers to build a high-quality global model in an efficacy way, we introduce an incentive mechanism that encourages workers to constantly collect fresh data to train accurate client models and boosts the global model training. We model the incentive-based interaction between the crowdsourcing platform and participating workers as a Stackelberg game, in which each side maximizes its own profit. We derive the Nash Equilibrium of the game to find the optimal solutions for the two sides. Experimental results confirm that iFedCrowd can complete secure crowdsourcing projects with high quality and efficiency.

#13 Towards Voice Reconstruction from EEG during Imagined Speech [PDF] [Copy] [Kimi]

Authors: Young-Eun Lee ; Seo-Hyun Lee ; Sang-Ho Kim ; Seong-Whan Lee

Translating imagined speech from human brain activity into voice is a challenging and absorbing research issue that can provide new means of human communication via brain signals. Efforts to reconstruct speech from brain activity have shown their potential using invasive measures of spoken speech data, but have faced challenges in reconstructing imagined speech. In this paper, we propose NeuroTalk, which converts non-invasive brain signals of imagined speech into the user's own voice. Our model was trained with spoken speech EEG which was generalized to adapt to the domain of imagined speech, thus allowing natural correspondence between the imagined speech and the voice as a ground truth. In our framework, an automatic speech recognition decoder contributed to decomposing the phonemes of the generated speech, demonstrating the potential of voice reconstruction from unseen words. Our results imply the potential of speech synthesis from human EEG signals, not only from spoken speech but also from the brain signals of imagined speech.

#14 Evaluating and Improving Interactions with Hazy Oracles [PDF] [Copy] [Kimi]

Authors: Stephan J. Lemmer ; Jason J. Corso

Many AI systems integrate sensor inputs, world knowledge, and human-provided information to perform inference. While such systems often treat the human input as flawless, humans are better thought of as hazy oracles whose input may be ambiguous or outside of the AI system's understanding. In such situations it makes sense for the AI system to defer its inference while it disambiguates the human-provided information by, for example, asking the human to rephrase the query. Though this approach has been considered in the past, current work is typically limited to application-specific methods and non-standardized human experiments. We instead introduce and formalize a general notion of deferred inference. Using this formulation, we then propose a novel evaluation centered around the Deferred Error Volume (DEV) metric, which explicitly considers the tradeoff between error reduction and the additional human effort required to achieve it. We demonstrate this new formalization and an innovative deferred inference method on the disparate tasks of Single-Target Video Object Tracking and Referring Expression Comprehension, ultimately reducing error by up to 48% without any change to the underlying model or its parameters.

#15 Human-in-the-Loop Vehicle ReID [PDF] [Copy] [Kimi]

Authors: Zepeng Li ; Dongxiang Zhang ; Yanyan Shen ; Gang Chen

Vehicle ReID has been an active topic in computer vision, with a substantial number of deep neural models proposed as end-to-end solutions. In this paper, we solve the problem from a new perspective and present an interesting variant called human-in-the-loop vehicle ReID to leverage interactive (and possibly wrong) human feedback signal for performance enhancement. Such human-machine cooperation mode is orthogonal to existing ReID models. To avoid incremental training overhead, we propose an Interaction ReID Network (IRIN) that can directly accept the feedback signal as an input and adjust the embedding of query image in an online fashion. IRIN is offline trained by simulating the human interaction process, with multiple optimization strategies to fully exploit the feedback signal. Experimental results show that even by interacting with flawed feedback generated by non-experts, IRIN still outperforms state-of-the-art ReID models by a considerable margin. If the feedback contains no false positive, IRIN boosts the mAP in Veri776 from 81.6% to 95.2% with only 5 rounds of interaction per query image.

#16 Modeling Human Trust and Reliance in AI-Assisted Decision Making: A Markovian Approach [PDF] [Copy] [Kimi]

Authors: Zhuoyan Li ; Zhuoran Lu ; Ming Yin

The increased integration of artificial intelligence (AI) technologies in human workflows has resulted in a new paradigm of AI-assisted decision making, in which an AI model provides decision recommendations while humans make the final decisions. To best support humans in decision making, it is critical to obtain a quantitative understanding of how humans interact with and rely on AI. Previous studies often model humans' reliance on AI as an analytical process, i.e., reliance decisions are made based on cost-benefit analysis. However, theoretical models in psychology suggest that the reliance decisions can often be driven by emotions like humans' trust in AI models. In this paper, we propose a hidden Markov model to capture the affective process underlying the human-AI interaction in AI-assisted decision making, by characterizing how decision makers adjust their trust in AI over time and make reliance decisions based on their trust. Evaluations on real human behavior data collected from human-subject experiments show that the proposed model outperforms various baselines in accurately predicting humans' reliance behavior in AI-assisted decision making. Based on the proposed model, we further provide insights into how humans' trust and reliance dynamics in AI-assisted decision making is influenced by contextual factors like decision stakes and their interaction experiences.

#17 Learning Deep Hierarchical Features with Spatial Regularization for One-Class Facial Expression Recognition [PDF] [Copy] [Kimi]

Authors: Bingjun Luo ; Junjie Zhu ; Tianyu Yang ; Sicheng Zhao ; Chao Hu ; Xibin Zhao ; Yue Gao

Existing methods on facial expression recognition (FER) are mainly trained in the setting when multi-class data is available. However, to detect the alien expressions that are absent during training, this type of methods cannot work. To address this problem, we develop a Hierarchical Spatial One Class Facial Expression Recognition Network (HS-OCFER) which can construct the decision boundary of a given expression class (called normal class) by training on only one-class data. Specifically, HS-OCFER consists of three novel components. First, hierarchical bottleneck modules are proposed to enrich the representation power of the model and extract detailed feature hierarchy from different levels. Second, multi-scale spatial regularization with facial geometric information is employed to guide the feature extraction towards emotional facial representations and prevent the model from overfitting extraneous disturbing factors. Third, compact intra-class variation is adopted to separate the normal class from alien classes in the decision space. Extensive evaluations on 4 typical FER datasets from both laboratory and wild scenarios show that our method consistently outperforms state-of-the-art One-Class Classification (OCC) approaches.

#18 Frustratingly Easy Truth Discovery [PDF] [Copy] [Kimi]

Authors: Reshef Meir ; Ofra Amir ; Omer Ben-Porat ; Tsviel Ben Shabat ; Gal Cohensius ; Lirong Xia

Truth discovery is a general name for a broad range of statistical methods aimed to extract the correct answers to questions, based on multiple answers coming from noisy sources. For example, workers in a crowdsourcing platform. In this paper, we consider an extremely simple heuristic for estimating workers' competence using average proximity to other workers. We prove that this estimates well the actual competence level and enables separating high and low quality workers in a wide spectrum of domains and statistical models. Under Gaussian noise, this simple estimate is the unique solution to the MLE with a constant regularization factor. Finally, weighing workers according to their average proximity in a crowdsourcing setting, results in substantial improvement over unweighted aggregation and other truth discovery algorithms in practice.

#19 Beam Search Optimized Batch Bayesian Active Learning [PDF] [Copy] [Kimi]

Authors: Jingyu Sun ; Hongjie Zhai ; Osamu Saisho ; Susumu Takeuchi

Active Learning is an essential method for label-efficient deep learning. As a Bayesian active learning method, Bayesian Active Learning by Disagreement (BALD) successfully selects the most representative samples by maximizing the mutual information between the model prediction and model parameters. However, when applied to a batch acquisition mode, like batch construction with greedy search, BALD suffers from poor performance, especially with noises of near-duplicate data. To address this shortcoming, we propose a diverse beam search optimized batch active learning method, which explores a graph for every batch construction by expanding the highest-scored samples of a predetermined number. To avoid near duplicate beam branches (very similar beams generated from the same root and similar samples), which is undesirable for lacking diverse representations in the feature space, we design a self-adapted constraint within candidate beams. The proposed method is able to acquire data that can better represent the distribution of the unlabeled pool, and at the same time, be significantly different from existing beams. We observe that the proposed method achieves higher batch performance than the baseline methods on three benchmark datasets.

#20 Multi-Scale Control Signal-Aware Transformer for Motion Synthesis without Phase [PDF] [Copy] [Kimi]

Authors: Lintao Wang ; Kun Hu ; Lei Bai ; Yu Ding ; Wanli Ouyang ; Zhiyong Wang

Synthesizing controllable motion for a character using deep learning has been a promising approach due to its potential to learn a compact model without laborious feature engineering. To produce dynamic motion from weak control signals such as desired paths, existing methods often require auxiliary information such as phases for alleviating motion ambiguity, which limits their generalisation capability. As past poses often contain useful auxiliary hints, in this paper, we propose a task-agnostic deep learning method, namely Multi-scale Control Signal-aware Transformer (MCS-T), with an attention based encoder-decoder architecture to discover the auxiliary information implicitly for synthesizing controllable motion without explicitly requiring auxiliary information such as phase. Specifically, an encoder is devised to adaptively formulate the motion patterns of a character's past poses with multi-scale skeletons, and a decoder driven by control signals to further synthesize and predict the character's state by paying context-specialised attention to the encoded past motion patterns. As a result, it helps alleviate the issues of low responsiveness and slow transition which often happen in conventional methods not using auxiliary information. Both qualitative and quantitative experimental results on an existing biped locomotion dataset, which involves diverse types of motion transitions, demonstrate the effectiveness of our method. In particular, MCS-T is able to successfully generate motions comparable to those generated by the methods using auxiliary information.

#21 SwiftAvatar: Efficient Auto-Creation of Parameterized Stylized Character on Arbitrary Avatar Engines [PDF] [Copy] [Kimi]

Authors: Shizun Wang ; Weihong Zeng ; Xu Wang ; Hao Yang ; Li Chen ; Chuang Zhang ; Ming Wu ; Yi Yuan ; Yunzhao Zeng ; Min Zheng ; Jing Liu

The creation of a parameterized stylized character involves careful selection of numerous parameters, also known as the "avatar vectors" that can be interpreted by the avatar engine. Existing unsupervised avatar vector estimation methods that auto-create avatars for users, however, often fail to work because of the domain gap between realistic faces and stylized avatar images. To this end, we propose SwiftAvatar, a novel avatar auto-creation framework that is evidently superior to previous works. SwiftAvatar introduces dual-domain generators to create pairs of realistic faces and avatar images using shared latent codes. The latent codes can then be bridged with the avatar vectors as pairs, by performing GAN inversion on the avatar images rendered from the engine using avatar vectors. Through this way, we are able to synthesize paired data in high-quality as many as possible, consisting of avatar vectors and their corresponding realistic faces. We also propose semantic augmentation to improve the diversity of synthesis. Finally, a light-weight avatar vector estimator is trained on the synthetic pairs to implement efficient auto-creation. Our experiments demonstrate the effectiveness and efficiency of SwiftAvatar on two different avatar engines. The superiority and advantageous flexibility of SwiftAvatar are also verified in both subjective and objective evaluations.

#22 Human Joint Kinematics Diffusion-Refinement for Stochastic Motion Prediction [PDF] [Copy] [Kimi]

Authors: Dong Wei ; Huaijiang Sun ; Bin Li ; Jianfeng Lu ; Weiqing Li ; Xiaoning Sun ; Shengxiang Hu

Stochastic human motion prediction aims to forecast multiple plausible future motions given a single pose sequence from the past. Most previous works focus on designing elaborate losses to improve the accuracy, while the diversity is typically characterized by randomly sampling a set of latent variables from the latent prior, which is then decoded into possible motions. This joint training of sampling and decoding, however, suffers from posterior collapse as the learned latent variables tend to be ignored by a strong decoder, leading to limited diversity. Alternatively, inspired by the diffusion process in nonequilibrium thermodynamics, we propose MotionDiff, a diffusion probabilistic model to treat the kinematics of human joints as heated particles, which will diffuse from original states to a noise distribution. This process not only offers a natural way to obtain the "whitened'' latents without any trainable parameters, but also introduces a new noise in each diffusion step, both of which facilitate more diverse motions. Human motion prediction is then regarded as the reverse diffusion process that converts the noise distribution into realistic future motions conditioned on the observed sequence. Specifically, MotionDiff consists of two parts: a spatial-temporal transformer-based diffusion network to generate diverse yet plausible motions, and a flexible refinement network to further enable geometric losses and align with the ground truth. Experimental results on two datasets demonstrate that our model yields the competitive performance in terms of both diversity and accuracy.

#23 Collective Intelligence in Human-AI Teams: A Bayesian Theory of Mind Approach [PDF] [Copy] [Kimi]

Authors: Samuel Westby ; Christoph Riedl

We develop a network of Bayesian agents that collectively model the mental states of teammates from the observed communication. Using a generative computational approach to cognition, we make two contributions. First, we show that our agent could generate interventions that improve the collective intelligence of a human-AI team beyond what humans alone would achieve. Second, we develop a real-time measure of human's theory of mind ability and test theories about human cognition. We use data collected from an online experiment in which 145 individuals in 29 human-only teams of five communicate through a chat-based system to solve a cognitive task. We find that humans (a) struggle to fully integrate information from teammates into their decisions, especially when communication load is high, and (b) have cognitive biases which lead them to underweight certain useful, but ambiguous, information. Our theory of mind ability measure predicts both individual- and team-level performance. Observing teams' first 25% of messages explains about 8% of the variation in final team performance, a 170% improvement compared to the current state of the art.

#24 Learning to Select Pivotal Samples for Meta Re-weighting [PDF] [Copy] [Kimi]

Authors: Yinjun Wu ; Adam Stein ; Jacob Gardner ; Mayur Naik

Sample re-weighting strategies provide a promising mechanism to deal with imperfect training data in machine learning, such as noisily labeled or class-imbalanced data. One such strategy involves formulating a bi-level optimization problem called the meta re-weighting problem, whose goal is to optimize performance on a small set of perfect pivotal samples, called meta samples. Many approaches have been proposed to efficiently solve this problem. However, all of them assume that a perfect meta sample set is already provided while we observe that the selections of meta sample set is performance-critical. In this paper, we study how to learn to identify such a meta sample set from a large, imperfect training set, that is subsequently cleaned and used to optimize performance in the meta re-weighting setting. We propose a learning framework which reduces the meta samples selection problem to a weighted K-means clustering problem through rigorously theoretical analysis. We propose two clustering methods within our learning framework, Representation-based clustering method (RBC) and Gradient-based clustering method (GBC), for balancing performance and computational efficiency. Empirical studies demonstrate the performance advantage of our methods over various baseline methods

#25 Better Peer Grading through Bayesian Inference [PDF] [Copy] [Kimi]

Authors: Hedayat Zarkoob ; Greg d'Eon ; Lena Podina ; Kevin Leyton-Brown

Peer grading systems aggregate noisy reports from multiple students to approximate a "true" grade as closely as possible. Most current systems either take the mean or median of reported grades; others aim to estimate students’ grading accuracy under a probabilistic model. This paper extends the state of the art in the latter approach in three key ways: (1) recognizing that students can behave strategically (e.g., reporting grades close to the class average without doing the work); (2) appropriately handling censored data that arises from discrete-valued grading rubrics; and (3) using mixed integer programming to improve the interpretability of the grades assigned to students. We demonstrate how to make Bayesian inference practical in this model and evaluate our approach on both synthetic and real-world data obtained by using our implemented system in four large classes. These extensive experiments show that grade aggregation using our model accurately estimates true grades, students' likelihood of submitting uninformative grades, and the variation in their inherent grading error; we also characterize our models' robustness.